53 research outputs found

    Design techniques for Xilinx Virtex FPGA configuration memory scrubbers

    SRAM-based FPGAs are in-field reconfigurable an unlimited number of times. This characteristic, together with their high performance and high logic density, proves to be very convenient for a number of ground- and space-level applications. One drawback of this technology is that it is susceptible to ionizing radiation, and this sensitivity increases with technology scaling. This is a first-order concern for applications in harsh radiation environments, and is becoming a concern for high-reliability ground applications. Several techniques exist for coping with radiation effects at the user-application level. To be effective, they need to be complemented with configuration memory scrubbing, which allows error mitigation and prevents failures due to error accumulation. Depending on the radiation environment and on the system dependability requirements, the configuration scrubber design can become more or less complex. This paper classifies and presents current and novel design methodologies and architectures for configuration memory scrubbers in SRAM-based FPGAs, and in particular in the Xilinx Virtex-4QV/5QV.

    Further Specialization of Clustered VLIW Processors: A MAP Decoder for Software Defined Radio

    Turbo codes are extensively used in current communications standards and have a promising outlook for future generations. The advantages of software defined radio, especially dynamic reconfiguration, make it very attractive in this multi-standard scenario. However, the complex and power-consuming implementation of the maximum a posteriori (MAP) algorithm, employed by turbo decoders, sets hurdles to this goal. This work introduces an ASIP architecture for the MAP algorithm, based on a dual-clustered VLIW processor. It combines the good performance of application-specific designs with the versatility of processors, which makes it compliant with leading-edge standards. The machine deals with multi-operand instructions in an innovative way: the fetching and assertion of data are serialized, and the addressing is automated and transparent to the programmer. In terms of the performance-area trade-off, the proposed architecture achieves a throughput of 8 cycles per symbol with very low power dissipation.

    Ratio-based temperature-sensing technique hardened against nanometer process variations

    This letter presents a temperature-sensing technique based on the temperature dependency of MOSFET leakage currents. To mitigate the effects of process variation, the ratio of two different leakage current measurements is calculated. Simulations show that this ratio is robust to process spread. The resulting sensor is quite small (0.0016 mm² including analog-to-digital conversion) and very energy efficient, consuming less than 640 pJ per conversion. After a two-point calibration, the error over the 40 °C to 110 °C range is less than 1.5 °C, which makes the technique suitable for thermal management applications.

    FPGA Acceleration of Monte Carlo-based Financial Simulation: Design Challenges and Lessons Learnt

    The simulation of interest rate derivatives is a powerful tool for facing current market fluctuations. However, the complexity of the financial models and the way they are processed require exorbitant computation times, which is in clear conflict with the need for a processing time as short as possible in order to operate in the financial market. To shorten the computation time of financial derivatives, the use of hardware accelerators becomes a must.

    An FPGA Implementation of the Powering Function with Single Precision Floating-Point Arithmetic

    In this work we present an FPGA implementation of a single-precision floating-point arithmetic powering unit. Our powering unit is based on an indirect method that transforms x^y into a chain of operations involving a logarithm, a multiplication, an exponential function and dedicated logic for the case of a negative base. This approach allows the full input range to be used for the base and exponent, without limiting the range of the exponent as in direct methods. A tailored hardware implementation is exploited to increase the accuracy of the unit by reducing the relative errors of the operations, while high performance is obtained by taking advantage of the FPGA capabilities for parallel architectures. A careful design of the pipeline stages of the operators involved allows a clock frequency of 201.3 MHz on a Xilinx Virtex-4 FPGA.
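
The indirect method described in this abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: the function name `pow_indirect` is hypothetical, and the handling of zero bases, zero exponents and non-integer exponents with a negative base is an assumption, since the abstract does not say how the dedicated logic treats those cases.

```python
import math


def pow_indirect(x: float, y: float) -> float:
    """Compute x**y as exp(y * ln|x|), with separate sign logic for x < 0.

    Hypothetical software analogue of the logarithm -> multiplication ->
    exponential chain; special-case conventions below are assumptions.
    """
    if y == 0.0:
        return 1.0  # assumed convention: anything to the power 0 is 1
    if x == 0.0:
        if y < 0.0:
            raise ZeroDivisionError("0 raised to a negative power")
        return 0.0
    if x > 0.0:
        # Core chain: logarithm, multiplication, exponential.
        return math.exp(y * math.log(x))
    # Negative base: a real result exists only for integer exponents.
    if not y.is_integer():
        return math.nan
    magnitude = math.exp(y * math.log(-x))
    # Odd integer exponents keep the negative sign, even ones drop it.
    return -magnitude if int(y) % 2 else magnitude
```

Because the exponent only enters through the multiplication, this form accepts fractional and negative exponents alike, which is the property the abstract contrasts with direct methods.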

    High Performance FPGA-oriented Mersenne Twister uniform random number generator

    Mersenne Twister (MT) uniform random number generators are key cores for hardware acceleration of Monte Carlo simulations. In this work, two different architectures are studied: besides the classical table-based architecture, an alternative architecture based on a circular buffer and specifically targeting FPGAs is proposed. A 30% performance improvement has been obtained when compared to the fastest previous work. The applicability of the proposed MT architectures has been proven in a high performance Gaussian RNG.
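
For readers unfamiliar with the generator itself, the following is a minimal software sketch of the standard 32-bit MT19937 algorithm, which keeps its 624-word state in a table. It is not the FPGA architecture proposed in the paper; the class and method names are illustrative, and the choice of the MT19937 parameter set and the common default seed 5489 is an assumption, since the abstract does not state which MT variant is implemented.

```python
class MT19937:
    """Reference-style 32-bit Mersenne Twister with a 624-word state table."""

    N, M = 624, 397
    MATRIX_A = 0x9908B0DF
    UPPER_MASK = 0x80000000  # most significant bit of a state word
    LOWER_MASK = 0x7FFFFFFF  # remaining 31 bits

    def __init__(self, seed: int = 5489) -> None:
        # Standard linear-recurrence seeding of the state table.
        self.state = [0] * self.N
        self.state[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.state[i - 1]
            self.state[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N  # force a twist before the first output

    def _twist(self) -> None:
        # Regenerate all 624 words in place.
        for i in range(self.N):
            y = (self.state[i] & self.UPPER_MASK) | (
                self.state[(i + 1) % self.N] & self.LOWER_MASK
            )
            nxt = self.state[(i + self.M) % self.N] ^ (y >> 1)
            if y & 1:
                nxt ^= self.MATRIX_A
            self.state[i] = nxt
        self.index = 0

    def next_u32(self) -> int:
        """Return the next tempered 32-bit output."""
        if self.index >= self.N:
            self._twist()
        y = self.state[self.index]
        self.index += 1
        # Tempering improves the equidistribution of the raw state word.
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF
```

Each new state word depends only on the words at offsets 0, 1 and 397 (modulo 624), so the table is always accessed in a fixed cyclic pattern.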

    On the Hardware Implementation of Triangle Traversal Algorithms for Graphics Processing

    Current GPU architectures provide impressive processing rates in graphical applications because of their specialized graphics pipeline. However, little attention has been paid to the analysis and study of different hardware architectures to implement specific pipeline stages. In this work we have identified one of the key stages in the graphics pipeline, the triangle traversal procedure, and we have implemented three different algorithms in hardware: bounding-box, zig-zag and Hilbert curve-based. The experimental results show that important area-performance trade-offs can be met when implementing key image processing algorithms in hardware.
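
Of the three traversal strategies named above, the bounding-box one is the simplest to show in software. The sketch below is a generic textbook formulation using edge functions, not the hardware implementation from the paper; the function names, the sampling at integer pixel coordinates and the inclusion of pixels lying exactly on an edge are all assumptions made for illustration.

```python
import math


def edge(a, b, p):
    """Signed area test: positive if p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def traverse_bounding_box(v0, v1, v2):
    """Visit every integer pixel in the triangle's bounding box and keep
    those covered by the triangle (edges included, either winding order)."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle covers nothing
    sign = 1 if area > 0 else -1

    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    min_x, max_x = math.floor(min(xs)), math.ceil(max(xs))
    min_y, max_y = math.floor(min(ys)), math.ceil(max(ys))

    covered = []
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            p = (x, y)
            # A pixel is inside when all three edge functions agree in sign
            # with the triangle's orientation.
            if (sign * edge(v1, v2, p) >= 0
                    and sign * edge(v2, v0, p) >= 0
                    and sign * edge(v0, v1, p) >= 0):
                covered.append(p)
    return covered
```

The bounding-box scan tests every pixel in the enclosing rectangle, so for thin or slanted triangles many of the tests fall outside the triangle.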

    Using pMOS Pass-Gates to Boost SRAM Performance by Exploiting Strain Effects in Sub-20-nm FinFET Technologies

    Fin straining is one of the techniques used to improve devices as their size keeps shrinking in new nanoscale nodes. In this paper, we use a predictive 14 nm technology in which pMOS mobility is significantly improved when those devices are built on top of long, uncut fins, while nMOS devices present the opposite behavior due to the combination of strains. We explore the possibility of boosting circuit performance in repetitive structures where long uncut fins can be exploited to increase the impact of fin strain. In particular, pMOS pass-gates are used in 6T complementary SRAM cells (CSRAM) with reinforced pull-ups. Those cells are simulated under process variability and compared to the regular SRAM. We show that when layout-dependent effects are considered, the CSRAM design provides 10% to 40% faster access time while keeping the same area, power, and stability as a regular 6T SRAM cell. The conclusions also apply to 8T SRAM cells. The CSRAM cell also presents increased reliability in technologies whose nMOS devices have more mismatch than pMOS transistors.

    A Web-based Environment Providing Remote Access to FPGA Platforms for Teaching Digital Hardware Design

    In this work we present the design and implementation of a Web-based application for remote access to the FPGA boards in a Digital Design Laboratory. It enables students from specialization courses to work on the design exercises at any place and time, even at home, with just Internet access and a Web browser. At the same time, it opens the possibility of prototyping small designs to the remaining students, who have no access rights to the physical Laboratory.

    A 0.0016 mm² 0.64 nJ leakage-based CMOS temperature sensor

    This paper presents a CMOS temperature sensor based on the thermal dependencies of the leakage currents, targeting the 65 nm node. To compensate for the effect of process fluctuations, the proposed sensor computes the ratio of two measurements of the time it takes a capacitor to discharge through a transistor in the subthreshold regime. Furthermore, a novel charging mechanism for the capacitor is proposed to further increase the robustness against fabrication variability. The sensor, including digitization and interfacing, occupies 0.0016 mm² and has an energy consumption of 47.7–633 pJ per sample. The resolution of the sensor is 0.28 °C, and the 3σ inaccuracy over the range 40–110 °C is 1.17 °C.